ZNF217 is an oncogenic protein whose gene lies in a chromosomal region frequently amplified in various cancers, including breast cancer (Cohen et al. (2015)). It is a known biomarker of poor prognosis and is amplified in 20-30% of primary human breast cancers (Littlepage et al. (2012)).
It belongs to the Krüppel-like family of transcription factors and acts as a transcriptional regulator in both repressor and activator complexes (Cohen et al. (2015)). Overexpression of ZNF217 drives tumorigenic behaviors, including impaired differentiation, increased self-renewal, motility and metastasis, and resistance to cell death.
Although ZNF217 is an established biomarker, the downstream network it regulates is still being characterised. This study sought to identify its gene targets through siRNA knockdown of ZNF217 in oestrogen receptor-positive (ER+), HER2-negative MCF7 cells, with paired scrambled-siRNA controls. Each condition was performed in triplicate, and total RNA was sequenced.
The dataset was obtained from recount3 (SRA accession number: SRP043043). Differentially expressed genes (DEGs) were identified with the R package DESeq2, using cut-offs of |log2 fold change| ≥ 1 and p-value < 0.05.
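The cut-off step can be sketched in Python. This is a minimal illustration, not the DESeq2 code used in this study. It assumes the DESeq2 results have already been exported as per-gene log2 fold changes and p-values. The `GeneResult` type, the `classify_degs` helper and the example genes are hypothetical.

```python
from typing import NamedTuple, Optional


class GeneResult(NamedTuple):
    """One row of an exported differential-expression table (hypothetical format)."""
    gene: str
    log2_fold_change: float
    pvalue: Optional[float]  # DESeq2 may report NA, represented here as None


def classify_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Split genes into up- and downregulated DEGs using |log2FC| >= cutoff and p < cutoff."""
    up, down = [], []
    for r in results:
        if r.pvalue is None or r.pvalue >= p_cutoff:
            continue  # not significant, or p-value missing
        if r.log2_fold_change >= lfc_cutoff:
            up.append(r.gene)
        elif r.log2_fold_change <= -lfc_cutoff:
            down.append(r.gene)
    return up, down


# Made-up values for illustration only; these are not results from this study.
example = [
    GeneResult("GENE_A", 2.3, 1e-6),
    GeneResult("GENE_B", -1.4, 0.01),
    GeneResult("GENE_C", 0.6, 1e-8),  # significant but below the fold-change cut-off
    GeneResult("GENE_D", 3.1, 0.20),  # large change but not significant
]
up, down = classify_degs(example)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

This sketch treats genes whose p-value DESeq2 reports as NA (for example, after independent filtering) as non-significant. The report does not state whether the original analysis handled them the same way.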
Gene set enrichment analysis (GSEA) was then performed with the R package clusterProfiler, and the top pathways were identified by their normalized enrichment score (NES).
Principal component analysis (PCA) is a dimensionality-reduction method. It is used here to check whether treatment condition is the main source of variation in the data and whether replicates within each condition are similar. In the figure below, the samples separate by treatment condition along PC1 and cluster tightly within their treatment groups. PC1, which separates the two conditions, explains 93% of the variance. PC2 explains only 5%, indicating minimal variation between replicates, which is ideal.
Fig.1: PCA of samples
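The percentages reported for PC1 and PC2 are each component's share of the total variance. A minimal numpy sketch of that calculation follows. It assumes a samples x genes matrix of transformed expression values; DESeq2's `plotPCA()` does a comparable calculation on variance-stabilised data restricted to the most variable genes. The helper `pca_variance_explained` and the toy matrix are hypothetical.

```python
import numpy as np


def pca_variance_explained(expr):
    """Return the fraction of total variance explained by each principal component.

    `expr` is a samples x genes matrix of (already transformed) expression values.
    Columns (genes) are centred before the singular value decomposition.
    """
    centred = expr - expr.mean(axis=0)
    # Squared singular values of the centred matrix are proportional to PC variances.
    singular_values = np.linalg.svd(centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Toy data: 3 control + 3 knockdown samples, 4 genes, with a strong group effect.
rng = np.random.default_rng(0)
group_shift = np.array([0, 0, 0, 1, 1, 1])[:, None] * np.array([5.0, -4.0, 3.0, 0.0])
toy = group_shift + rng.normal(scale=0.3, size=(6, 4))
ratios = pca_variance_explained(toy)
print(np.round(ratios * 100, 1))  # PC1 dominates because the group effect is large
```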
A volcano plot gives a quick view of the relationship between statistical significance (p-value) and log2 fold change for each gene. Applying user-defined cut-offs for fold change and p-value shows what proportion of genes are significant and whether they are up- or downregulated.
In the figure below, there are more significantly upregulated than downregulated genes in the siZNF217 samples compared with the paired controls. In total, there were 571 DEGs: 495 upregulated and 76 downregulated.
## Warning: One or more p-values is 0. Converting to 10^-1 * current lowest non-zero p-value...
Fig.2: Volcano Plot of Differentially Expressed Genes in siZNF217 vs Scrambled Control Samples
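The warning printed before Fig. 2 appears because a p-value of 0 (numerical underflow) has an infinite -log10 value. Below is a minimal sketch of the replacement that warning describes, assuming a plain Python list of p-values. `neg_log10_pvalues` is a hypothetical helper, not the plotting package's own code.

```python
import math


def neg_log10_pvalues(pvalues):
    """Compute -log10(p) for a volcano plot, replacing p = 0 before transforming.

    A zero p-value would give an infinite -log10(p), so it is replaced by
    10^-1 times the smallest non-zero p-value in the list.
    """
    nonzero = [p for p in pvalues if p > 0]
    if not nonzero:
        raise ValueError("need at least one non-zero p-value")
    floor = min(nonzero) * 0.1
    return [-math.log10(p if p > 0 else floor) for p in pvalues]


print(neg_log10_pvalues([0.05, 1e-10, 0.0]))
```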
A heatmap shows differences in gene expression between samples from different treatment conditions and lets us check consistency among replicates within each group. In the figure below, many more DEGs are upregulated than downregulated in the knockdown samples compared with the controls.
Fig.3: Heatmap of differentially expressed genes
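Heatmaps of DEGs are often drawn on per-gene (row) z-scores, so that genes with very different expression levels share one colour scale. The report does not say whether Fig. 3 used this scaling. The sketch below only illustrates the common approach, and `row_zscores` is a hypothetical helper.

```python
import statistics


def row_zscores(matrix):
    """Scale each gene (row) to mean 0 and standard deviation 1 across samples.

    Uses the sample standard deviation (n - 1 denominator) and requires at least
    two samples per row. Rows with zero variance are mapped to all zeros.
    """
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


print(row_zscores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```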
We then examined the top 20 over- and under-expressed genes, each visualized as a heatmap. Notably, the overexpressed genes are involved in keratinocyte differentiation, epidermal differentiation, and cell signalling. In particular, SCEL is involved in terminal keratinocyte differentiation, FLG in epidermal differentiation, ATP10D in transport pathways, and PDE7B and TLR1 in cell signalling [3-7].
Interestingly, PART1, a lncRNA reported to promote breast cancer progression by enhancing proliferation and metastasis, was also upregulated.
Gene functions were retrieved from the GeneCards suite (Stelzer et al. (2016)).
Fig.4: Top 20 Over Expressed Genes
Fig.5: Top 20 Under Expressed Genes
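The report does not state how the top 20 genes were chosen; a common choice is to rank significant DEGs by log2 fold change. Under that assumption, the selection can be sketched as follows. `top_n_by_fold_change` and the example values are hypothetical.

```python
def top_n_by_fold_change(degs, n=20):
    """Return the n most up- and n most downregulated genes by log2 fold change.

    `degs` is a list of (gene, log2_fold_change) pairs that already passed the DEG cut-offs.
    """
    ranked = sorted(degs, key=lambda item: item[1], reverse=True)
    top_up = [gene for gene, lfc in ranked if lfc > 0][:n]
    top_down = [gene for gene, lfc in reversed(ranked) if lfc < 0][:n]
    return top_up, top_down


example = [("A", 3.0), ("B", -2.0), ("C", 5.0), ("D", -4.0), ("E", 1.5)]
print(top_n_by_fold_change(example, n=2))  # (['C', 'A'], ['D', 'B'])
```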
Because small changes in the expression of individual genes can have cascading effects, it is informative to examine changes across entire pathways for a more holistic view. This was done with GSEA.
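At its core, GSEA walks down a ranked gene list and keeps a running sum. The sum rises at genes in the set and falls at genes outside it, and the enrichment score (ES) is its maximum deviation from zero. The sketch below implements that running sum with the standard weight of 1. It omits the permutation step that turns ES into the NES reported here. `enrichment_score` is a hypothetical helper, not clusterProfiler's implementation (which by default delegates to the fgsea package).

```python
def enrichment_score(ranked_genes, gene_set, weight=1.0):
    """Weighted running-sum enrichment score, as used in classic GSEA.

    ranked_genes: list of (gene, statistic) pairs sorted from most positive to most negative.
    gene_set: set of gene identifiers in the pathway.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    hits = [abs(stat) ** weight if gene in gene_set else 0.0 for gene, stat in ranked_genes]
    hit_total = sum(hits)
    n_miss = sum(1 for gene, _ in ranked_genes if gene not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need a set member with non-zero statistic and at least one non-member")
    running, best = 0.0, 0.0
    for (gene, _), hit in zip(ranked_genes, hits):
        if gene in gene_set:
            running += hit / hit_total   # step up, weighted by the gene's statistic
        else:
            running -= 1.0 / n_miss      # uniform step down for non-members
        if abs(running) > abs(best):
            best = running
    return best


ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0), ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0: all set members sit at the top
```

A positive ES indicates that the set members cluster near the top of the list (upregulated in this comparison), and a negative ES indicates that they cluster near the bottom.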
## Warning in fgsea(pathways = geneSets, stats = geneList, nperm = nPerm, minSize = minGSSize, : There are ties in the preranked stats (27.85% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.
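The warning above means that many genes share exactly the same ranking value, so their relative order, and therefore their position in the GSEA running sum, is arbitrary. One way to quantify this is the share of values that repeat an earlier value. The hypothetical `tied_fraction` helper below illustrates the idea; it may not match fgsea's internal calculation exactly.

```python
def tied_fraction(stats):
    """Fraction of entries whose value already appeared earlier in the list."""
    seen = set()
    duplicates = 0
    for value in stats:
        if value in seen:
            duplicates += 1
        else:
            seen.add(value)
    return duplicates / len(stats) if stats else 0.0


print(tied_fraction([2.0, 1.5, 1.5, 0.0, 0.0, 0.0]))  # 0.5
```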
Gene Ontology gene sets from MSigDB were used. The top 5 upregulated pathways were those with the highest NES, and the top 5 downregulated pathways were those with the lowest NES. The p-value cut-off for significance was 0.25. The top upregulated pathways were consistent with the top overexpressed DEGs.
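The pathway selection described above can be sketched as follows, assuming the GSEA results are available as (pathway, NES, p-value) records. `PathwayResult`, `top_pathways` and the pathway names are hypothetical.

```python
from typing import NamedTuple


class PathwayResult(NamedTuple):
    name: str
    nes: float      # normalized enrichment score
    pvalue: float


def top_pathways(results, n=5, p_cutoff=0.25):
    """Return the n significant pathways with the highest NES and the n with the lowest NES."""
    significant = [r for r in results if r.pvalue < p_cutoff]
    up = sorted((r for r in significant if r.nes > 0), key=lambda r: r.nes, reverse=True)
    down = sorted((r for r in significant if r.nes < 0), key=lambda r: r.nes)
    return [r.name for r in up[:n]], [r.name for r in down[:n]]


example = [
    PathwayResult("PATHWAY_1", 2.4, 0.001),
    PathwayResult("PATHWAY_2", 1.9, 0.010),
    PathwayResult("PATHWAY_3", 2.8, 0.400),  # highest NES but above the cut-off
    PathwayResult("PATHWAY_4", -2.1, 0.002),
    PathwayResult("PATHWAY_5", -1.2, 0.200),
]
print(top_pathways(example, n=2))  # (['PATHWAY_1', 'PATHWAY_2'], ['PATHWAY_4', 'PATHWAY_5'])
```

Filtering before ranking ensures every reported pathway meets the 0.25 cut-off, even if a non-significant pathway has a more extreme NES.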
Fig.6: Top 5 Over Expressed Pathways
Fig.7: Top 5 Under Expressed Pathways
The overexpression of genes and pathways related to keratinocyte and epidermal differentiation in siZNF217 cells compared with controls is unsurprising, since ZNF217 is known to suppress cell differentiation. GSEA also showed upregulation of the cornification pathway in the knockdown cells; cornification is a form of programmed cell death specific to the epidermis. Together, these results suggest that ZNF217 contributes to tumorigenesis by suppressing cell differentiation and programmed cell death.
Conversely, the downregulation of genes and pathways related to cell proliferation in the knockdown cells suggests that ZNF217 promotes tumour growth by driving proliferation.
As a next step, selected genes should be validated by RT-qPCR. Genes that promote epidermal/keratinocyte differentiation could be candidate therapeutic targets and can be investigated further with in vitro assays.